Abstract

We consider the problem of reconstructing a sparse signal $x^0 \in \mathbb{R}^n$ from a limited number of
linear measurements. Given $m$ randomly selected samples of $Ux^0$, where $U$ is an orthonormal
matrix, we show that $\ell_1$ minimization recovers $x^0$ exactly when the number of measurements
exceeds

$$m \ge \mathrm{Const} \cdot \mu^2(U) \cdot S \cdot \log n,$$

where $S$ is the number of nonzero components in $x^0$, and $\mu$ is the largest entry in $U$ properly
normalized: $\mu(U) = \sqrt{n} \cdot \max_{k,j} |U_{k,j}|$. The smaller $\mu$, the fewer samples needed.

The result holds for "most" sparse signals $x^0$ supported on a fixed (but arbitrary) set $T$.
Given $T$, if the sign of $x^0$ for each nonzero entry on $T$ and the observed values of $Ux^0$ are drawn
at random, the signal is recovered with overwhelming probability. Moreover, there is a sense
in which this is nearly optimal since any method succeeding with the same probability would
require just about this many samples.
1 Introduction

1.1 Sparse recovery from partial measurements 

This paper addresses the problem of signal acquisition in a broad setting. We are interested in
"sampling" a vector $x^0 \in \mathbb{R}^n$. Instead of observing $x^0$ directly, we sample a small number $m$ of
transform coefficients of $x^0$. For an orthogonal matrix$^1$ $U$ with

$$U^*U = nI, \tag{1.1}$$

these transform coefficients are given by $y^0 = Ux^0$. Of course, if all $n$ of the coefficients $y^0$ are
observed, recovering $x^0$ is trivial: we simply apply $\frac{1}{n}U^*$ to the vector of observations $y^0$. Instead,
we are concerned with the highly underdetermined case in which only a small fraction of the
components of $y^0$ are actually sampled or observed. Given a subset $\Omega \subset \{1, \ldots, n\}$ of size $|\Omega| = m$,
the challenge is to infer the "long" $n$-dimensional vector $x^0$ from the "short" $m$-dimensional vector
of observations $y = U_\Omega x^0$, where $U_\Omega$ is the $m \times n$ matrix consisting of the rows of $U$ indexed by $\Omega$.
In plain English, we wish to solve a system of linear equations in which there are fewer equations
than unknowns.

A special instance of this problem was investigated in a recent paper [4], where $U$ is taken as the
usual discrete Fourier transform. The main result of this work is that if $x^0$ is $S$-sparse (at most
$S$ of the $n$ components of $x^0$ are nonzero), then it can be recovered perfectly from on the order
of $S \log n$ Fourier-domain samples. The recovery algorithm is concrete and tractable: given the
discrete Fourier coefficients

$$y_k = \sum_{t=1}^{n} x^0(t)\, e^{-i 2\pi (t-1)k/n}, \quad k \in \Omega, \tag{1.2}$$

or $y = F_\Omega x^0$ for short, we solve the convex optimization program

$$\min_x \|x\|_{\ell_1} \quad \text{subject to} \quad F_\Omega x = y.$$

For a fixed $x^0$, the recovery is exact for the overwhelming majority of sample sets $\Omega$ of size

$$|\Omega| \ge C \cdot S \cdot \log n, \tag{1.3}$$

where $C$ is a known (small) constant.

Since [4], a theory of "compressed sensing" has developed around several papers [6,7,9] demonstrating
the effectiveness of $\ell_1$ minimization for recovering sparse signals from a limited number
of measurements. To date, most of this effort has been focused on systems which take completely
unstructured, noise-like measurements, i.e. the observation vector $y$ is created from a series of inner
products against random test vectors $\{\phi_k\}$:

$$y_k = \langle \phi_k, x^0 \rangle, \quad k = 1, \ldots, m. \tag{1.4}$$

The collection $\{\phi_k\}$ is sometimes referred to as a measurement ensemble; we can write (1.4) compactly
as $y = \Phi x^0$, where the rows of $\Phi$ are the $\phi_k$. Published results take $\phi_k$ to be a realization
of Gaussian white noise, or a sequence of Bernoulli random variables taking values ±1 with equal
probability. This work has shown that taking random measurements is in some sense an optimal
strategy for acquiring sparse signals; it requires a near-minimal number of measurements [1,6,7,9,10]:
$m$ measurements can recover signals with sparsity $S \lesssim m/\log(n/m)$, and all of the constants
appearing in the analysis are small [13]. Similar bounds have also appeared using greedy [28] and
complexity-based [17] recovery algorithms in place of $\ell_1$ minimization.

$^1$ On a first reading, our choice of normalization of $U$ may seem a bit strange. The advantages of taking the row
vectors of $U$ to have Euclidean norm $\sqrt{n}$ are that 1) the notation in the sequel will be cleaner, and 2) it will be easier
to see how this result generalizes the special case of incomplete sampling in the Fourier domain presented in [4].

Although theoretically powerful, the practical relevance of results for completely random measurements
is limited in two ways. The first is that we are not always at liberty to choose the types
of measurements we use to acquire a signal. For example, in magnetic resonance imaging (MRI),
subtle physical properties of nuclei are exploited to collect samples in the Fourier domain of a
two- or three-dimensional object of interest. While we have control over which Fourier coefficients
are sampled, the measurements are inherently frequency based. A similar statement can be made
about tomographic imaging; the machinery in place measures Radon slices, and these are what we
must use to reconstruct an image.

The second drawback to completely unstructured measurement systems is computational. Random
(i.e. unstructured) measurement ensembles are unwieldy numerically; for large values of $m$ and $n$,
simply storing or applying $\Phi$ (tasks which are necessary to solve the $\ell_1$ minimization program) is
nearly impossible. If, for example, we want to reconstruct a megapixel image ($n = 1{,}000{,}000$) from
$m = 25{,}000$ measurements (see the numerical experiment in Section 2), we would need more than
3 gigabytes of memory just to store the measurement matrix, and on the order of gigaflops to apply
it. The goal from this point of view, then, is to have similar recovery bounds for measurement
matrices $\Phi$ which can be applied quickly (in $O(n)$ or $O(n \log n)$ time) and implicitly (allowing us
to use a "matrix free" recovery algorithm).
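The storage figure quoted above can be checked with a few lines of arithmetic. The sketch below assumes one bit per matrix entry (as for a ±1 Bernoulli ensemble); that assumption is ours, chosen because it reproduces the "more than 3 gigabytes" figure, and the 8-byte floating-point case is shown only for comparison.

```python
# Back-of-the-envelope storage cost for an explicit m-by-n measurement matrix.
n = 1_000_000   # megapixel image, as in the text
m = 25_000      # number of measurements, as in the text

entries = m * n

# ASSUMPTION: one bit per entry (a +/-1 Bernoulli ensemble).
storage_gb_one_bit = entries / 8 / 1e9

# For comparison only: 8 bytes per entry (double-precision Gaussian entries).
storage_gb_float64 = entries * 8 / 1e9

print(f"entries:            {entries:.3e}")
print(f"1 bit per entry:    {storage_gb_one_bit:.3f} GB")
print(f"8 bytes per entry:  {storage_gb_float64:.1f} GB")
```

Under either assumption the explicit matrix is impractical to hold in memory, which is the motivation for fast, implicit transforms.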

Our main theorem, stated precisely in Section 1.2 and proven in Section 3, states that bounds
analogous to (1.3) hold for sampling with general orthogonal systems. We will show that for a fixed
signal support $T$ of size $|T| = S$, the program

$$\min_x \|x\|_{\ell_1} \quad \text{subject to} \quad U_\Omega x = U_\Omega x^0 \tag{1.5}$$

recovers the overwhelming majority of $x^0$ supported on $T$ and observation subsets $\Omega$ of size

$$|\Omega| \ge C \cdot \mu^2(U) \cdot S \cdot \log n, \tag{1.6}$$

where $\mu(U)$ is simply the largest magnitude among the entries in $U$:

$$\mu(U) = \max_{k,j} |U_{k,j}|. \tag{1.7}$$

It is important to understand the relevance of the parameter $\mu(U)$ in (1.6). $\mu(U)$ can be interpreted
as a rough measure of how concentrated the rows of $U$ are. Since each row (or column) of $U$
necessarily has an $\ell_2$-norm equal to $\sqrt{n}$, $\mu$ will take a value between 1 and $\sqrt{n}$. When the rows
of $U$ are perfectly flat ($|U_{k,j}| = 1$ for each $k, j$, as in the case when $U$ is the discrete Fourier
transform), we will have $\mu(U) = 1$, and (1.6) is essentially as good as (1.3). If a row of $U$ is
maximally concentrated (all the row entries but one vanish), then $\mu^2(U) = n$, and (1.6) offers
us no guarantees for recovery from a limited number of samples. This result is very intuitive.
Suppose indeed that $U_{k_0,j_0} = \sqrt{n}$ and $x^0$ is 1-sparse with a nonzero entry in the $j_0$th location. To
reconstruct $x^0$, we need to observe the $k_0$th entry of $Ux^0$ as otherwise, the data vector $y$ will vanish.
In other words, to reconstruct $x^0$ with probability greater than $1 - 1/n$, we will need to see all the
components of $Ux^0$, which is just about the content of (1.6). This shows informally that (1.6) is
fairly tight on both ends of the range of the parameter $\mu$.
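The two extremes of $\mu(U)$ discussed above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (`coherence`, `dft_matrix`, `scaled_identity`) are ours and purely illustrative. Both matrices are normalized so that $U^*U = nI$, as in (1.1).

```python
import cmath
import math

def coherence(U):
    """mu(U) = max_{k,j} |U[k][j]|, as in (1.7)."""
    return max(abs(entry) for row in U for entry in row)

def dft_matrix(n):
    """Unnormalized DFT matrix with entries exp(-2*pi*i*j*k/n), so U*U = n I."""
    return [[cmath.exp(-2j * math.pi * j * k / n) for j in range(n)]
            for k in range(n)]

def scaled_identity(n):
    """sqrt(n) * I: each row is maximally concentrated, and U*U = n I."""
    s = math.sqrt(n)
    return [[s if j == k else 0.0 for j in range(n)] for k in range(n)]

n = 16
mu_flat = coherence(dft_matrix(n))        # perfectly flat rows
mu_spiky = coherence(scaled_identity(n))  # maximally concentrated rows

print(f"mu(DFT)       = {mu_flat:.6f}")
print(f"mu(sqrt(n) I) = {mu_spiky:.6f}  (sqrt(n) = {math.sqrt(n):.6f})")
```

Plugging these values into (1.6) gives a requirement on the order of $S \log n$ samples for the Fourier case and on the order of $n\, S \log n$ (no useful guarantee) for the concentrated case.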

For a particular application, $U$ can be decomposed as a product of a sparsity basis $\Psi$, and an
orthogonal measurement system $\Phi$. Suppose for instance that we wish to recover a signal $f \in \mathbb{R}^n$
from $m$ measurements of the form $y = \Phi_\Omega f$. The signal may not be sparse in the time domain but
its expansion in the basis $\Psi$ may be

$$f(t) = \sum_{j=1}^{n} x^0_j\, \psi_j(t), \qquad f = \Psi x^0$$

(the columns of $\Psi$ are the discrete waveforms $\psi_j$). Our program searches for the coefficient sequence
in the $\Psi$-domain with minimum $\ell_1$ norm that explains the samples in the measurement domain $\Phi$.
In short, it solves (1.5) with

$$U = \Phi\Psi, \qquad \Psi^*\Psi = I, \qquad \Phi^*\Phi = nI.$$

The result (1.6) then tells us how the relationship between the sensing modality ($\Phi$) and signal model
($\Psi$) affects the number of measurements required to reconstruct a sparse signal. The parameter $\mu$
can be rewritten as

$$\mu(\Phi\Psi) = \max_{k,j} |\langle \phi_k, \psi_j \rangle|,$$

and serves as a rough characterization of the degree of similarity between the sparsity and measurement
systems. For $\mu$ to be close to its minimum value of 1, each of the measurement vectors
(rows of $\Phi$) must be "spread out" in the $\Psi$ domain. To emphasize this relationship, $\mu(U)$ is often
referred to as the mutual coherence [11,12]. The bound (1.6) tells us that an $S$-sparse signal can
be reconstructed from $\sim S \log n$ samples in any domain in which the test vectors are "flat", i.e. the
coherence parameter is $O(1)$.

1.2 Main result 

The ability of the $\ell_1$-minimization program (1.5) to recover a given signal $x^0$ depends only on 1)
the support set $T$ of $x^0$, and 2) the sign sequence $z_0$ of $x^0$ on $T$.$^2$ For a fixed support $T$, our main
theorem shows that perfect recovery is achieved for the overwhelming majority of the combinations
of sign sequences on $T$, and sample locations (in the $U$ domain) of size $m$ obeying (1.6).

The language "overwhelming majority" is made precise by introducing a probability model on the
set $\Omega$ and the sign sequence $z$. The model is simple: select $\Omega$ uniformly at random from the set of
all subsets of the given size $m$; choose each $z(t)$, $t \in T$, to be ±1 with probability 1/2. Our main
result is:

Theorem 1.1 Let $U$ be an $n \times n$ orthogonal matrix ($U^*U = nI$) with $|U_{k,j}| \le \mu(U)$. Fix a subset
$T$ of the signal domain. Choose a subset $\Omega$ of the measurement domain of size $|\Omega| = m$, and a sign
sequence $z$ on $T$ uniformly at random. Suppose that

$$m \ge C_0 \cdot |T| \cdot \mu^2(U) \cdot \log(n/\delta) \tag{1.8}$$

and also $m \ge C_0' \cdot \log^2(n/\delta)$ for some fixed numerical constants $C_0$ and $C_0'$. Then with probability
exceeding $1 - \delta$, every signal $x^0$ supported on $T$ with signs matching $z$ can be recovered from $y = U_\Omega x^0$
by solving (1.5).

$^2$ In other words, the recoverability of $x^0$ is determined by the facet of the $\ell_1$ ball of radius $\|x^0\|_{\ell_1}$ on which $x^0$
resides.
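The probability model behind Theorem 1.1 (a uniformly random subset $\Omega$ of size $m$ and independent ±1 signs on $T$) can be sketched as follows. This is only an illustration of the sampling model, not of the recovery itself; the function name `draw_model` and the particular values of `n`, `m`, and `T` are hypothetical, and indices are 0-based for convenience.

```python
import random

def draw_model(n, T, m, rng):
    """Draw (Omega, z) from the model of Theorem 1.1.

    Omega is uniform over all size-m subsets of {0, ..., n-1};
    z(t) is +1 or -1 with probability 1/2, independently for each t in T.
    """
    omega = sorted(rng.sample(range(n), m))
    z = {t: rng.choice((-1, 1)) for t in T}
    return omega, z

rng = random.Random(0)       # fixed seed so the draw is reproducible
n, m = 64, 10                # hypothetical sizes
T = [3, 17, 40]              # hypothetical fixed support
omega, z = draw_model(n, T, m, rng)

print("Omega:", omega)
print("signs:", z)
```

Note that the support $T$ is held fixed and is not randomized; only $\Omega$ and the signs are random.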



The hinge of Theorem 1.1 is a new weak uncertainty principle for general orthobases. Given $T$ and
$\Omega$ as above, it is impossible to find a signal which is concentrated on $T$ and on $\Omega$ in the $U$ domain.
In the example above where $U = \Phi\Psi$, this says that one cannot be concentrated on small sets in
the $\Psi$ and $\Phi$ domains simultaneously. As noted in previous publications [3,4], this is a statement
about the eigenvalues of minors of the matrix $U$. Let $U_T$ be the $n \times |T|$ matrix corresponding to
the columns of $U$ indexed by $T$, and let $U_{\Omega T}$ be the $m \times |T|$ matrix corresponding to the rows of
$U_T$ indexed by $\Omega$. In Section 3, we will prove the following:

Theorem 1.2 Let $U$, $T$, and $\Omega$ be as in Theorem 1.1. Suppose that the number of measurements $m$
obeys

$$m \ge |T| \cdot \mu^2(U) \cdot \max\left(C_1 \log |T|,\; C_2 \log(3/\delta)\right), \tag{1.9}$$

for some positive constants $C_1, C_2$. Then

$$\mathbf{P}\left( \left\| \frac{1}{m} U_{\Omega T}^* U_{\Omega T} - I \right\| \ge 1/2 \right) \le \delta, \tag{1.10}$$

where $\|\cdot\|$ is the standard operator $\ell_2$ norm; here, the largest eigenvalue (in absolute value).

For small values of $\delta$, the eigenvalues of $U_{\Omega T}^* U_{\Omega T}$ are all close to $m$ with high probability. To see
that this is an uncertainty principle, let $x \in \mathbb{R}^n$ be a sequence supported on $T$, and suppose that
$\|m^{-1} U_{\Omega T}^* U_{\Omega T} - I\| \le 1/2$. It follows that

$$\frac{m}{2}\, \|x\|_{\ell_2}^2 \;\le\; \|U_\Omega x\|_{\ell_2}^2 \;\le\; \frac{3m}{2}\, \|x\|_{\ell_2}^2, \tag{1.11}$$

which asserts that only a small portion of the energy of $x$ will be concentrated on the set $\Omega$ in
the $U$-domain (the total energy obeys $\|Ux\|_{\ell_2}^2 = n\|x\|_{\ell_2}^2$). Moreover, this portion is essentially
proportional to the size of $\Omega$.
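The energy statement around (1.11) can be explored numerically for the Fourier case. The sketch below, using only the standard library, verifies the total-energy identity $\|Ux\|^2 = n\|x\|^2$ and reports the ratio $\|U_\Omega x\|^2 / (m\|x\|^2)$ for one random draw of $\Omega$; (1.11) says this ratio lies in $[1/2, 3/2]$ whenever the event in (1.10) does not occur. The sizes, support, and helper names are hypothetical choices for illustration.

```python
import cmath
import math
import random

def dft_row(k, n):
    """Row k of the unnormalized DFT matrix (so that U*U = n I)."""
    return [cmath.exp(-2j * math.pi * k * t / n) for t in range(n)]

def energy_on(rows, x, n):
    """||U_rows x||^2: sum over k in rows of |<row_k, x>|^2."""
    total = 0.0
    for k in rows:
        coeff = sum(u * xt for u, xt in zip(dft_row(k, n), x))
        total += abs(coeff) ** 2
    return total

rng = random.Random(1)
n, m = 64, 24                 # hypothetical sizes
T = [5, 20, 41]               # hypothetical support
x = [0.0] * n
for t in T:
    x[t] = rng.choice((-1.0, 1.0))

omega = rng.sample(range(n), m)
norm_x_sq = sum(v * v for v in x)
total_energy = energy_on(range(n), x, n)   # should equal n * ||x||^2
omega_energy = energy_on(omega, x, n)
ratio = omega_energy / (m * norm_x_sq)     # compare with [1/2, 3/2] in (1.11)

print(f"||Ux||^2 = {total_energy:.4f},  n||x||^2 = {n * norm_x_sq:.4f}")
print(f"||U_Omega x||^2 / (m ||x||^2) = {ratio:.4f}")
```

The ratio itself depends on the random draw, so no particular value is claimed here; only the total-energy identity is deterministic.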



1.3 Contributions and relationship to prior work 

The relationship of the mutual incoherence parameter $\mu$ to the performance of $\ell_1$ minimization
programs with equality constraints first appeared in the context of Basis Pursuit for sparse approximation,
see [12] and also [11,14,16].

As mentioned in the previous section, [4] demonstrated the effectiveness of $\ell_1$ recovery from Fourier-domain
samples in slightly more general situations than in Theorem 1.1 (randomization of the signs
on $T$ is not required). Obviously, the results presented in this paper considerably extend this Fourier
sampling theorem.

We also note that since [4], several papers have appeared on using $\ell_1$ minimization to recover sparse
signals from a limited number of measurements [5,7,9]. In particular [7] and [25] provide bounds
for reconstruction from a random subset of measurements selected from an orthogonal basis; these
papers ask that all sparse signals be simultaneously recoverable from the same set of samples
(which is stronger than our goal here), and their bounds have log factors of $(\log n)^6$ and $(\log n)^5$
respectively. These results are based on uniform uncertainty principles, which require (1.10) to
hold for all sets $T$ of a certain size simultaneously once $\Omega$ is chosen. Whether or not this log power
can be reduced in this context remains an open question.

A contribution of this paper is to show that if one is only interested in the recovery of nearly all
signals on a fixed set $T$, these extra log factors can indeed be removed. We show that to guarantee
exact recovery, we only require $U_{\Omega T}$ to be well behaved for this fixed $T$ as opposed to all $T$'s of the
same size, which is a significantly weaker requirement. By examining the singular values of $U_{\Omega T}$,
one can check whether or not (1.11) holds.

Our method of proof, as the reader will see in Section 3, relies on a variation of the powerful results
presented in [24] about the expected spectral norm of certain random matrices. We also introduce
a novel large-deviation inequality, similar in spirit to those reviewed in [19,20] but carefully tailored
for our purposes, to turn this statement about expectation into one about high probability.

Finally, we would like to contrast this work with [29], which also draws on the results from [24].
First, there is a difference in how the problem is framed. In [29], the $m \times n$ measurement system is
fixed, and bounds for perfect recovery are derived when the support and sign sequence are chosen
at random, i.e. a fixed measurement system works for most signal supports of a certain size. In
this paper, we fix an arbitrary signal support, and show that we will be able to recover from
most sets of measurements of a certain size in a fixed domain. Second, although a slightly more
general class of measurement systems is considered in [29], the final bounds for sparse recovery in
the context of (1.5) do not fundamentally improve on the uniform bounds cited above; [29] draws
weaker conclusions since the results are not shown to be universal in the sense that all sparse signals
are recovered as in [7] and [25].

2 Applications 

In the 1990s, image compression algorithms were revolutionized by the introduction of the wavelet
transform. The reasons for this can be summarized with two major points: the wavelet transform
is a much sparser representation for photograph-like images than traditional Fourier-based
representations, and it can be applied and inverted in $O(n)$ computations.

To exploit this wavelet-domain sparsity in acquisition, we must have a measurement system which
is incoherent with the wavelet representation (so that $\mu$ in (1.6) is small) and that can be applied
quickly and implicitly (so that large-scale recovery is computationally feasible). In this section, we
present numerical experiments for two such measurement strategies.

2.1 Fourier sampling of sparse wavelet subbands 

Our first measurement strategy takes advantage of the fact that at fine scales, wavelets are very
much spread out in frequency. We will illustrate this in 1D; the ideas are readily applied to 2D
images.
Figure 1: Wavelets in the frequency domain. The curves shown above are the magnitude of the discrete
Fourier transform (2.1) of Daubechies-8 wavelets for $n = 1024$ and $j = 1, 2, 3$. The magnitude of $\hat\psi_{j,\cdot}$ over
the subband (2.3) is shown in bold.

Labeling the scales of the wavelet transform by $j = 1, 2, \ldots, J$, where $j = 1$ is the finest scale and
$j = J$ the coarsest, the wavelets$^3$ $\psi_{j,k}$ at scale $j$ are almost flat in the Fourier domain over a band
of size $n_j = n 2^{-j}$. The magnitude of the Fourier transform

$$\hat\psi_{j,k}(\omega) = \sum_{t=1}^{n} \psi_{j,k}(t)\, e^{-i 2\pi (t-1)\omega/n}, \quad \omega = -n/2 + 1, \ldots, n/2, \tag{2.1}$$

is the same for each wavelet at scale $j$, since

$$\hat\psi_{j,k}(\omega) = e^{-i 2\pi (k-1) 2^j \omega/n}\, \hat\psi_{j,1}(\omega). \tag{2.2}$$
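The fact used in (2.2), that a circular shift changes only the phase of the discrete Fourier transform and not its magnitude, can be checked directly. The sketch below uses a short hypothetical waveform as a stand-in (it is not a Daubechies wavelet) and a naive $O(n^2)$ DFT with 0-based indices, so it illustrates the shift property rather than the wavelet transform itself.

```python
import cmath
import math

def dft(x):
    """Naive DFT: X[w] = sum_t x[t] * exp(-2*pi*i*t*w/n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * t * w / n) for t in range(n))
            for w in range(n)]

def circular_shift(x, s):
    """Return y with y[t] = x[(t - s) mod n]."""
    n = len(x)
    return [x[(t - s) % n] for t in range(n)]

n, s = 16, 4
psi = [0.0] * n
psi[0], psi[1], psi[2], psi[3] = 0.5, 0.5, -0.5, -0.5   # hypothetical stand-in waveform

X = dft(psi)
Xs = dft(circular_shift(psi, s))

# Magnitudes agree, and the spectra differ by the phase factor exp(-2*pi*i*s*w/n).
mag_gap = max(abs(abs(a) - abs(b)) for a, b in zip(X, Xs))
phase_gap = max(abs(Xs[w] - cmath.exp(-2j * math.pi * s * w / n) * X[w])
                for w in range(n))

print(f"max magnitude difference: {mag_gap:.2e}")
print(f"max deviation from phase-shift formula: {phase_gap:.2e}")
```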

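These spectrum magnitudes are shown for the Daubechies-8 wavelet in Figure 1. We see that over
frequencies in the $j$th subband

$$\omega \in B_j := \{n_j/2 + 1, \ldots, n_j\} \cup \{-n_j + 1, \ldots, -n_j/2\}, \tag{2.3}$$

we have

$$\frac{\max_{\omega \in B_j} |\hat\psi_{j,k}(\omega)|}{\min_{\omega \in B_j} |\hat\psi_{j,k}(\omega)|} < \mathrm{Const} \approx \sqrt{2}.$$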
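As a small illustration of the index set in (2.3), the sketch below builds $B_j$ for $n = 1024$ and checks that it contains exactly $n_j = n2^{-j}$ frequencies, all within the range $-n/2+1, \ldots, n/2$ used in (2.1). The function name `subband` is ours.

```python
def subband(n, j):
    """Return (n_j, B_j) with n_j = n / 2**j and B_j as defined in (2.3)."""
    n_j = n // 2 ** j
    positive = list(range(n_j // 2 + 1, n_j + 1))        # n_j/2 + 1, ..., n_j
    negative = list(range(-n_j + 1, -(n_j // 2) + 1))    # -n_j + 1, ..., -n_j/2
    return n_j, positive + negative

n = 1024
bands = {j: subband(n, j) for j in (1, 2, 3)}

for j, (n_j, B_j) in bands.items():
    print(f"j = {j}: n_j = {n_j}, |B_j| = {len(B_j)}, "
          f"min = {min(B_j)}, max = {max(B_j)}")
```

For these values of $j$ the subbands are disjoint, so each frequency is assigned to at most one scale.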

Suppose now that a signal $x^0$ is a superposition of $S$ wavelets at scale $j$, that is, we can write

$$x^0 = \Psi_j w^0$$

where $w^0 \in \mathbb{R}^{n_j}$ is $S$-sparse, and $\Psi_j$ is the $n \times n_j$ matrix whose columns are the $\psi_{j,k}(t)$ for
$k = 1, \ldots, n_j$. We will measure $x^0$ by selecting Fourier coefficients from the band $B_j$ at random. To
see how this scheme fits into the domain of the results in the introduction, let $\omega$ index the subband
$B_j$, let $F_j$ be the $n_j \times n$ matrix whose rows are the Fourier vectors for frequencies in $B_j$, let $D_j$ be
a diagonal matrix with

$$(D_j)_{\omega,\omega} = \hat\psi_{j,1}(\omega), \quad \omega \in B_j,$$



$^3$ Wavelets are naturally parameterized by a scale $j$ and a shift $k$ with $k = 1, 2, \ldots, n2^{-j}$; see [21]. The wavelets
at a set scale are just circular shifts of one another: $\psi_{j,k}(t) = \psi_{j,1}(t - 2^j k)$, where the subtraction is modulo $n$.
Table 1: Number of measurements required to reconstruct a sparse subband. Here, $n = 1024$, $S$ is the
sparsity of the subband, and $M(S,j)$ is the smallest number of measurements so that the $S$-sparse subband
at wavelet level $j$ was recovered perfectly in 1000/1000 trials.
[Table 1 body: columns $S$ and $M(S,j)$; surviving entries, in extraction order: 25, 35, 25, 15, 40, 24, 15, 49, 27]

and consider the $n_j \times n_j$ system

$$U = D_j^{-1} F_j \Psi_j.$$

The columns of $F_j\Psi_j$ are just the Fourier transforms of the wavelets given in (2.2),
and so $U$ is just an $n_j \times n_j$ Fourier system. In fact, one can easily check that $U^*U = UU^* = n_j I$.
We choose a set of Fourier coefficients $\Omega$ of size $m$ in the band $B_j$, and measure

$$y = F_\Omega x^0 = F_\Omega \Psi_j w^0,$$

which can easily be turned into a set of samples in the $U$ domain, $y' = U_\Omega w^0$, just by re-weighting
$y$. Since the mutual incoherence of $D_j^{-1} F_j \Psi_j$ is $\mu = 1$, we can recover $w^0$ from $\sim S \log n$ samples.

Table 1 summarizes the results of the following experiment: Fix the scale $j$, sparsity $S$, and a
number of measurements $m$. Perform a trial for $(S, j, m)$ by first generating a signal support $T$ of
size $S$, a sign sequence on that support, and a measurement set $\Omega_j$ of size $m$ uniformly at random,
and then measuring $y = F_{\Omega_j} \Psi_j x^0$ ($x^0$ is just the sign sequence on $T$ and zero elsewhere), solving
(1.5), and declaring success if the solution matches $x^0$. A thousand trials were performed for each
$(S, j, m)$. The value $M(S,j)$ recorded in the table is the smallest value of $m$ such that the recovery
was successful in all 1000 trials. As with the partial Fourier ensemble (see the numerical results
in [4]), we can recover from $m \approx 2S$ to $3S$ measurements.

To use the above results in an imaging system, we would first separate the signal/image into wavelet
subbands, measure Fourier coefficients in each subband as above, then reconstruct each subband
independently. In other words, if $P_{W_j}$ is the projection operator onto the space spanned by the
columns of $\Psi_j$, we measure

$$y^j = F_{\Omega_j} P_{W_j} x^0$$

for $j = 1, \ldots, J$, then set $w^j$ to be the solution to

$$\min_w \|w\|_{\ell_1} \quad \text{subject to} \quad F_{\Omega_j} \Psi_j w = y^j.$$

If all of the wavelet subbands of the object we are imaging are appropriately sparse, we will be able
to recover the image perfectly.

Finally, we would like to note that this projection onto $W_j$ in the measurement process can be
avoided by constructing the wavelet and sampling systems a little more carefully. In [12], a
"bi-sinusoidal" measurement system is introduced which complements the orthonormal Meyer wavelet
transform. These bi-sinusoids are an alternative orthobasis to the $W_j$ spanned by Meyer wavelets
at a given scale (with perfect mutual incoherence), so sampling in the bi-sinusoidal basis isolates a
given wavelet subband automatically.

In the next section, we examine an orthogonal measurement system which allows us to forgo this
subband separation altogether.

2.2 Noiselet measurements 

In [8], a complex "noiselet" system is constructed that is perfectly incoherent with the Haar wavelet
representation. If $\Psi$ is an orthonormal system of Haar wavelets, and $\Phi$ is the orthogonal noiselet
system (renormalized so that $\Phi^*\Phi = nI$), then $U = \Phi\Psi$ has entries of constant magnitude:

$$|U_{k,j}| = 1, \quad \forall k, j, \quad \text{which implies} \quad \mu(U) = 1.$$

Just as the canonical basis is maximally incoherent with the Fourier basis, so is the noiselet system
with Haar wavelets. Thus if an $n$-pixel image is $S$-sparse in the Haar wavelet domain, it can be
recovered (with high probability) from $\sim S \log n$ randomly selected noiselet coefficients.

In addition to perfect incoherence with the Haar transform, noiselets have two additional properties
that make them ideal for coded image acquisition:

1. The noiselet matrix $\Phi$ can be decomposed as a multiscale filterbank. As a result, it can be
applied in $O(n \log n)$ time.

2. The real and imaginary parts of each noiselet function are binary valued. A noiselet measurement
of an image is just an inner product with a sign pattern, which makes its implementation
in an actual acquisition system easier. (It would be straightforward to use noiselets
in the imaging architecture proposed in [26], for example.)

A large-scale numerical example is shown in Figure 2. The $n = 1024^2$ pixel synthetic image in
panel (a) is an exact superposition of $S = 25{,}000$ Haar wavelets$^4$. The observation vector $y$ was
created from $m = 70{,}000$ randomly chosen noiselet coefficients (each noiselet coefficient has a real
and imaginary part, so there are really 140,000 real numbers recorded). From $y$, we are able to
recover the image exactly by solving (1.5).

This result is a nice demonstration of the compressed sensing paradigm. A traditional acquisition
process would measure all $n \approx 10^6$ pixels, transform into the wavelet domain, and record the $S$ that
are important. Many measurements are made, but comparatively few numbers are recorded. Here
we take only a fraction of the number of measurements, and are able to find the $S$ active wavelet
coefficients without any prior knowledge of their locations.

The measurement process can be adjusted slightly in a practical setting. We know that almost all
of the coarse-scale wavelet coefficients will be important (see Figure 2(b)), so we can potentially
reduce the number of measurements needed for perfect recovery by measuring these directly. In
fact, if we measure the $128 \times 128$ block of coarse wavelet coefficients for the image in Figure 2
directly (equivalent to measuring averages over $8 \times 8$ blocks of pixels, 16,384 measurements in total),
we are able to recover the image perfectly from an additional 41,808 complex noiselet measurements
(the total number of real numbers recorded is 100,000).

$^4$ The image was created in the obvious way: the well-known test image was transformed into the Haar domain,
all but the 25,000 largest Haar coefficients were set to zero, and the result inverse transformed back into the spatial
domain.

Figure 2: Sparse image recovery from noiselet measurements. (a) Synthetic $n = 1024^2$-pixel image
with $S = 25{,}000$ non-zero Haar wavelet coefficients. (b) Locations (in the wavelet quadtree) of
significant wavelet coefficients. (c) Image recovered from $m = 70{,}000$ complex noiselet measurements.
The recovery matches (a) exactly.
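The measurement counts quoted in this example are easy to verify. The short computation below only restates the arithmetic from the text; the variable names are ours.

```python
# Image and coarse-block sizes quoted in the text.
side = 1024
coarse_side = 128

block = side // coarse_side            # pixels per side of each averaged block
coarse_measurements = coarse_side ** 2 # direct coarse-scale measurements (real)

# Each complex noiselet coefficient contributes two real numbers.
extra_complex = 41_808
total_real_hybrid = coarse_measurements + 2 * extra_complex

# Pure noiselet acquisition from the earlier experiment.
total_real_noiselet_only = 2 * 70_000

print(f"block size:             {block} x {block}")
print(f"coarse measurements:    {coarse_measurements}")
print(f"hybrid total (real):    {total_real_hybrid}")
print(f"noiselet-only (real):   {total_real_noiselet_only}")
```

Measuring the coarse scale directly thus reduces the recorded real numbers from 140,000 to 100,000 in this example.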

3 Proofs 

3.1 General strategy 

The proof of Theorem 1.1 follows the program set forth in [4,15]. As detailed in these references,
the signal $x^0$ is the unique solution to (1.5) if and only if there exists a dual vector $\pi \in \mathbb{R}^n$ with
the following properties:

• $\pi$ is in the row space of $U_\Omega$,

• $\pi(t) = \operatorname{sgn} x^0(t)$ for $t \in T$, and

• $|\pi(t)| < 1$ for $t \in T^c$.

We consider the candidate

$$\pi = U_\Omega^* U_{\Omega T} \left(U_{\Omega T}^* U_{\Omega T}\right)^{-1} z_0, \tag{3.1}$$

where $z_0$ is a $|T|$-dimensional vector whose entries are the signs of $x^0$ on $T$, and show that under
the conditions in the theorem 1) $\pi$ is well defined (i.e. $U_{\Omega T}^* U_{\Omega T}$ is invertible), and given this 2)
$|\pi(t)| < 1$ on $T^c$ (we automatically have that $\pi$ is in the row space of $U_\Omega$ and $\pi(t) = \operatorname{sgn} x^0(t)$ on $T$).
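The candidate (3.1) can be formed explicitly for a small Fourier example. The sketch below is a minimal illustration, not the argument of the proof: it builds $\pi$ for the unnormalized DFT with a hypothetical prime length $n = 31$ (chosen so that every $U_{\Omega T}$ with $m \ge |T|$ has full column rank), solves the small Gram system with a hand-written Gaussian elimination, and then reports how $\pi$ behaves on and off $T$. All names and sizes are assumptions made for illustration.

```python
import cmath
import math
import random

def solve(A, b):
    """Solve A x = b for a small square complex system
    by Gaussian elimination with partial pivoting."""
    k = len(A)
    M = [list(A[i]) + [b[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    x = [0j] * k
    for i in reversed(range(k)):
        acc = M[i][k] - sum(M[i][c] * x[c] for c in range(i + 1, k))
        x[i] = acc / M[i][i]
    return x

def dual_candidate(n, omega, T, z):
    """pi = U_Omega^* U_{Omega T} (U_{Omega T}^* U_{Omega T})^{-1} z for the DFT."""
    U = {k: [cmath.exp(-2j * math.pi * k * t / n) for t in range(n)]
         for k in omega}
    # Gram matrix G = U_{Omega T}^* U_{Omega T}, of size |T| x |T|.
    G = [[sum(U[k][s].conjugate() * U[k][t] for k in omega) for t in T]
         for s in T]
    w = solve(G, [complex(v) for v in z])
    # c = U_{Omega T} w (one value per sampled row), then pi = U_Omega^* c.
    c = {k: sum(U[k][t] * w[i] for i, t in enumerate(T)) for k in omega}
    return [sum(U[k][t].conjugate() * c[k] for k in omega) for t in range(n)]

rng = random.Random(0)
n, m = 31, 12                 # hypothetical sizes; n is prime
T = [2, 11, 23]               # hypothetical support
z = [rng.choice((-1, 1)) for _ in T]
omega = rng.sample(range(n), m)

pi = dual_candidate(n, omega, T, z)
on_support_gap = max(abs(pi[t] - z[i]) for i, t in enumerate(T))
off_support_max = max(abs(pi[t]) for t in range(n) if t not in T)

print(f"max |pi(t) - z(t)| on T:  {on_support_gap:.2e}")
print(f"max |pi(t)| off T:        {off_support_max:.4f}")
```

By construction $\pi$ matches the sign sequence on $T$; whether the off-support maximum falls below 1 depends on the random draw, which is exactly what the remainder of the proof controls.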
We want to show that with the support fixed, a dual vector exists with high probability when
selecting $\Omega$ uniformly at random. Following [4], it is enough to show that the desired properties hold
when $\Omega$ is sampled using a Bernoulli model. Suppose $\Omega_1$ of size $m$ is sampled uniformly at random,
and $\Omega_2$ is sampled by setting

$$\Omega_2 := \{k : \delta_k = 1\},$$

where here and below $\delta_1, \delta_2, \ldots, \delta_n$ is a sequence of independent identically distributed 0/1 Bernoulli
random variables with

$$\mathbf{P}(\delta_k = 1) = m/n.$$

Then

$$\mathbf{P}(\mathrm{Failure}(\Omega_1)) \le 2\, \mathbf{P}(\mathrm{Failure}(\Omega_2)) \tag{3.2}$$

(see [4] for details). With this in hand, we will establish the existence of a dual vector for $x^0$
with high probability for $\Omega$ sampled using the Bernoulli model.
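The Bernoulli model differs from the uniform model in that the size of $\Omega_2$ is itself random, with mean $m$. The sketch below draws $\Omega_2$ repeatedly and reports the average size; the function name and the particular sizes are hypothetical, and indices are 0-based.

```python
import random

def bernoulli_subset(n, m, rng):
    """Omega_2 = {k : delta_k = 1}, with i.i.d. delta_k and P(delta_k = 1) = m/n."""
    p = m / n
    return [k for k in range(n) if rng.random() < p]

rng = random.Random(0)
n, m, trials = 1000, 100, 200     # hypothetical sizes
sizes = [len(bernoulli_subset(n, m, rng)) for _ in range(trials)]
mean_size = sum(sizes) / trials

print(f"target m = {m}, average |Omega_2| over {trials} draws = {mean_size:.2f}")
print(f"smallest / largest |Omega_2|: {min(sizes)} / {max(sizes)}")
```

The independence of the $\delta_k$ is what makes the sums in the following sections tractable, at the cost of the factor 2 in (3.2).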

The matrix $U_{\Omega T}^* U_{\Omega T}$ is now a random variable, which can be written as

$$U_{\Omega T}^* U_{\Omega T} = \sum_{k=1}^{n} \delta_k\, u^k \otimes u^k,$$

where the $u^k$ are the row vectors of $U_T$; $u^k = (U_{k,t})_{t \in T}$.



3.2 Proof of Theorem 1.2



Our first result, which is an analog to a theorem of Rudelson [24, Th. 1], states that if $m$ is large
enough, then on average the matrix $m^{-1} U_{\Omega T}^* U_{\Omega T}$ deviates little from the identity.

Theorem 3.1 Let $U$ be an orthogonal matrix obeying (1.1). Consider a fixed set $T$ and let $\Omega$ be a
random set sampled using the Bernoulli model. Then

$$\mathbf{E} \left\| \frac{1}{m} U_{\Omega T}^* U_{\Omega T} - I \right\| \le C_R \cdot \frac{\sqrt{\log |T|}}{\sqrt{m}} \cdot \max_{1 \le k \le n} \|u^k\| \tag{3.3}$$

for some positive constant $C_R$, provided the right-hand side is less than 1. Since the coherence $\mu(U)$
obeys

$$\max_{1 \le k \le n} \|u^k\| \le \mu(U) \sqrt{|T|},$$

this implies

$$\mathbf{E} \left\| \frac{1}{m} U_{\Omega T}^* U_{\Omega T} - I \right\| \le C_R \cdot \mu(U) \cdot \sqrt{\frac{|T| \log |T|}{m}}. \tag{3.4}$$

The probabilistic model is different here than in [24]. The argument, however, is similar.
Proof We are interested in $\mathbf{E}\|Y\|$ where $Y$ is the random sum

$$Y = \frac{1}{m} \sum_{k=1}^{n} \delta_k\, u^k \otimes u^k - I.$$

Note that since $U^*U = nI$,

$$\mathbf{E}\, Y = \frac{1}{m} \sum_{k=1}^{n} \frac{m}{n}\, u^k \otimes u^k - I = \frac{1}{n} \sum_{k=1}^{n} u^k \otimes u^k - I = 0.$$

We now use a symmetrization technique to bound the expected value of the norm of $Y$. We let $Y'$
be an independent copy of $Y$, i.e.

$$Y' = \frac{1}{m} \sum_{k=1}^{n} \delta_k'\, u^k \otimes u^k - I, \tag{3.5}$$

where $\delta_1', \ldots, \delta_n'$ are independent copies of $\delta_1, \ldots, \delta_n$, and write

$$\mathbf{E}\|Y\| \le \mathbf{E}\|Y - Y'\|,$$

which follows from Jensen's inequality and the law of iterated expectation (also known as Fubini's
theorem). Now let $\epsilon_1, \ldots, \epsilon_n$ be a sequence of Bernoulli variables taking values ±1 with probability
1/2 (and independent of the sequences $\delta$ and $\delta'$). We have

$$\mathbf{E}\|Y\| \le \mathbf{E}_{\delta,\delta'} \left\| \frac{1}{m} \sum_{k=1}^{n} (\delta_k - \delta_k')\, u^k \otimes u^k \right\|
= \mathbf{E}_\epsilon \mathbf{E}_{\delta,\delta'} \left\| \frac{1}{m} \sum_{1 \le k \le n} \epsilon_k (\delta_k - \delta_k')\, u^k \otimes u^k \right\|
\le 2\, \mathbf{E}_\epsilon \mathbf{E}_\delta \left\| \frac{1}{m} \sum_{1 \le k \le n} \epsilon_k \delta_k\, u^k \otimes u^k \right\|;$$

the first equality follows from the symmetry of the random variable $(\delta_k - \delta_k')\, u^k \otimes u^k$ while the last
inequality follows from the triangle inequality.

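We may now apply Rudelson's powerful lemma [24] which states that

$$\mathbf{E}_\epsilon \left\| \sum_{k=1}^{n} \epsilon_k \delta_k\, u^k \otimes u^k \right\| \le \frac{C_R}{4} \cdot \sqrt{\log |T|} \cdot \max_{1 \le k \le n} \|u^k\| \cdot \sqrt{\left\| \sum_{k=1}^{n} \delta_k\, u^k \otimes u^k \right\|} \tag{3.6}$$

for some universal constant $C_R > 0$ (the notation should make it clear that the left-hand side is
only averaged over $\epsilon$). Taking expectation over $\delta$ then gives

$$\mathbf{E}\|Y\| \le \frac{C_R}{2} \cdot \frac{\sqrt{\log |T|}}{m} \cdot \max_{1 \le k \le n} \|u^k\| \cdot \mathbf{E} \sqrt{\left\| \sum_{k=1}^{n} \delta_k\, u^k \otimes u^k \right\|}
\le \frac{C_R}{2} \cdot \frac{\sqrt{\log |T|}}{m} \cdot \max_{1 \le k \le n} \|u^k\| \cdot \sqrt{\mathbf{E} \left\| \sum_{k=1}^{n} \delta_k\, u^k \otimes u^k \right\|}, \tag{3.7}$$

where the second inequality uses the fact that for a nonnegative random variable $Z$, $\mathbf{E}\sqrt{Z} \le \sqrt{\mathbf{E} Z}$.
Observe now that

$$\mathbf{E} \left\| \sum_{k=1}^{n} \delta_k\, u^k \otimes u^k \right\| = \mathbf{E}\|mY + mI\| \le m\,(\mathbf{E}\|Y\| + 1)$$

and, therefore, (3.7) gives

$$\mathbf{E}\|Y\| \le a \cdot \sqrt{\mathbf{E}\|Y\| + 1}, \qquad a = \frac{C_R}{2} \cdot \frac{\sqrt{\log |T|}}{\sqrt{m}} \cdot \max_{1 \le k \le n} \|u^k\|.$$

It then follows that if $a \le 1$,

$$\mathbf{E}\|Y\| \le 2a,$$

which concludes the proof of the theorem. ■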
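For completeness, the last step of the proof (from $\mathbf{E}\|Y\| \le a\sqrt{\mathbf{E}\|Y\| + 1}$ to $\mathbf{E}\|Y\| \le 2a$) can be worked out as follows. This is our own elementary verification, with the shorthand $e = \mathbf{E}\|Y\|$, and is not taken from the original argument.

```latex
% Shorthand: e = E\|Y\| \ge 0, and we assume e \le a\sqrt{e+1} with 0 < a \le 1.
\begin{align*}
e \le a\sqrt{e+1}
  &\;\Longrightarrow\; e^2 - a^2 e - a^2 \le 0 \\
  &\;\Longrightarrow\; e \le \frac{a^2 + \sqrt{a^4 + 4a^2}}{2}
     = \frac{a\left(a + \sqrt{a^2 + 4}\right)}{2} \\
  &\;\Longrightarrow\; e \le \frac{a\,(1 + \sqrt{5})}{2} < 2a
     \qquad \text{(using } a \le 1\text{)}.
\end{align*}
```

Since $2a$ is exactly the right-hand side of (3.3), this recovers the statement of Theorem 3.1.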

With Theorem 3.1 established, we have a bound on the expected value of $\|m^{-1} U_{\Omega T}^* U_{\Omega T} - I\|$.
Theorem 1.2 shows that $m^{-1} U_{\Omega T}^* U_{\Omega T}$ is close to the identity with high probability, turning the
statement about expectation into a corresponding large deviation result.

The proof of Theorem 1.2 uses remarkable estimates about the large deviations of suprema of sums
of independent random variables. Let $Y_1, \ldots, Y_n$ be a sequence of independent random variables
taking values in a Banach space and let $Z$ be the supremum defined as

$$Z = \sup_{f \in \mathcal{F}} \sum_{i=1}^{n} f(Y_i), \tag{3.8}$$

where $\mathcal{F}$ is a countable family of real-valued functions. In a striking paper, Talagrand [27] proved
a concentration inequality about $Z$ which is stated below, see also [19, Corollary 7.8].

Theorem 3.2 Assume that $|f| \le B$ for every $f$ in $\mathcal{F}$, and $\mathbf{E} f(Y_i) = 0$ for every $f$ in $\mathcal{F}$ and
$i = 1, \ldots, n$. Then for all $t \ge 0$,

$$\mathbf{P}(|Z - \mathbf{E} Z| > t) \le 3 \exp\left( -\frac{t}{KB} \log\left( 1 + \frac{Bt}{\sigma^2 + B\, \mathbf{E}\bar{Z}} \right) \right), \tag{3.9}$$

where $\sigma^2 = \sup_{f \in \mathcal{F}} \sum_{i=1}^{n} \mathbf{E} f^2(Y_i)$, $\bar{Z} = \sup_{f \in \mathcal{F}} \left| \sum_{i=1}^{n} f(Y_i) \right|$, and $K$ is a numerical constant.

We note that very precise values of the numerical constant $K$ are known and are small, see [22]
and [18,23].

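Proof of Theorem 1.2 Set $Y$ to be the matrix $\frac{1}{m} U_{\Omega T}^* U_{\Omega T} - I$ and recall that $\frac{1}{n} \sum_{k=1}^{n} u^k \otimes u^k = I$,
which allows us to express $Y$ as

$$Y = \sum_{k=1}^{n} \left( \delta_k - \frac{m}{n} \right) \frac{u^k \otimes u^k}{m} := \sum_{k=1}^{n} Y_k,$$

where

$$Y_k := \left( \delta_k - \frac{m}{n} \right) \frac{u^k \otimes u^k}{m}.$$

Note that $\mathbf{E}\, Y_k = 0$. We are interested in the spectral norm $\|Y\|$. By definition,

$$\|Y\| = \sup\, \langle f_1, Y f_2 \rangle = \sup \sum_{k=1}^{n} \langle f_1, Y_k f_2 \rangle,$$

where the supremum is over a countable collection of unit vectors. For a fixed pair of unit vectors
$(f_1, f_2)$, let $f(Y_k)$ denote the mapping $\langle f_1, Y_k f_2 \rangle$. Since $\mathbf{E} f(Y_k) = 0$, we can apply Theorem 3.2
with $B$ obeying

$$|f(Y_k)| \le \frac{|\langle f_1, u^k \rangle \langle u^k, f_2 \rangle|}{m} \le \frac{\|u^k\|^2}{m} \le B, \quad \text{for all } k.$$

As such, we can take $B = \max_{1 \le k \le n} \|u^k\|^2 / m$. We now compute

$$\mathbf{E} f^2(Y_k) = \frac{m}{n} \left( 1 - \frac{m}{n} \right) \frac{|\langle f_1, u^k \rangle \langle u^k, f_2 \rangle|^2}{m^2}
\le \frac{m}{n} \left( 1 - \frac{m}{n} \right) \frac{\|u^k\|^2}{m^2}\, |\langle u^k, f_2 \rangle|^2.$$

Since $\sum_{1 \le k \le n} |\langle u^k, f_2 \rangle|^2 = n$, we proved that

$$\sum_{1 \le k \le n} \mathbf{E} f^2(Y_k) \le \left( 1 - \frac{m}{n} \right) \frac{1}{m} \max_{1 \le k \le n} \|u^k\|^2 \le B.$$

In conclusion, with $Z = \|Y\| = \bar{Z}$, Theorem 3.2 shows that

$$\mathbf{P}\left( \big|\, \|Y\| - \mathbf{E}\|Y\| \,\big| > t \right) \le 3 \exp\left( -\frac{t}{KB} \log\left( 1 + \frac{t}{1 + \mathbf{E}\|Y\|} \right) \right). \tag{3.10}$$

Take $m$ large enough so that $\mathbf{E}\|Y\| \le 1/4$ in (3.4), and pick $t = 1/4$. Since $B \le \mu^2(U)|T|/m$,
(3.10) gives

$$\mathbf{P}(\|Y\| > 1/2) \le 3 \exp\left( -\frac{m}{C_T\, \mu^2(U)\, |T|} \right),$$

for $C_T = 4K/\log(6/5)$. Taking $C_1 = 16 C_R^2$ and $C_2 = C_T$ finishes the proof. ■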
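The constants at the end of this proof can be traced step by step. The following worked derivation is our own bookkeeping under the choices stated in the proof ($t = 1/4$, $\mathbf{E}\|Y\| \le 1/4$, $B \le \mu^2(U)|T|/m$); it is meant as a check on the values of $C_T$, $C_1$, and $C_2$ rather than as part of the original text.

```latex
\begin{align*}
% From (3.9) to (3.10): sigma^2 <= B and \bar Z = \|Y\|.
\sigma^2 + B\,\mathbf{E}\bar{Z} &\le B\,(1 + \mathbf{E}\|Y\|)
  \;\Longrightarrow\; \frac{Bt}{\sigma^2 + B\,\mathbf{E}\bar{Z}} \ge \frac{t}{1 + \mathbf{E}\|Y\|}, \\
% Plug in t = 1/4 and E\|Y\| <= 1/4.
\log\!\Big(1 + \frac{t}{1 + \mathbf{E}\|Y\|}\Big) &\ge \log\!\Big(1 + \frac{1/4}{5/4}\Big) = \log(6/5), \\
% Use B <= mu^2 |T| / m.
\frac{t}{KB}\,\log(6/5) &\ge \frac{m\,\log(6/5)}{4K\,\mu^2(U)\,|T|}
  = \frac{m}{C_T\,\mu^2(U)\,|T|}, \qquad C_T = \frac{4K}{\log(6/5)}, \\
% Condition for E\|Y\| <= 1/4 via (3.4): gives C_1.
C_R\,\mu(U)\sqrt{\frac{|T|\log|T|}{m}} \le \frac{1}{4}
  &\iff m \ge 16\,C_R^2\,\mu^2(U)\,|T|\,\log|T|, \\
% Condition for the tail bound to be at most delta: gives C_2.
3\exp\!\Big(-\frac{m}{C_T\,\mu^2(U)\,|T|}\Big) \le \delta
  &\iff m \ge C_T\,\mu^2(U)\,|T|\,\log(3/\delta).
\end{align*}
```

Finally, since $\mathbf{E}\|Y\| \le 1/4$, the event $\|Y\| > 1/2$ implies $\big|\|Y\| - \mathbf{E}\|Y\|\big| > 1/4$, which is how (3.10) with $t = 1/4$ yields (1.10) under condition (1.9).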



3.3 Proof of Theorem 1.1

With Theorem 1.2 established, we know that with high probability the eigenvalues of $U_{\Omega T}^* U_{\Omega T}$ will
be tightly controlled: they are all between $m/2$ and $3m/2$. Under these conditions, the inverse
of $U_{\Omega T}^* U_{\Omega T}$ not only exists, but we can guarantee that $\|(U_{\Omega T}^* U_{\Omega T})^{-1}\| \le 2/m$, a fact which we
will use to show $|\pi(t)| < 1$ for $t \in T^c$.

For a particular $t_0 \in T^c$, we can rewrite $\pi(t_0)$ as

$$\pi(t_0) = \langle v^0, (U_{\Omega T}^* U_{\Omega T})^{-1} z \rangle = \langle w^0, z \rangle,$$

where $v^0$ is the row vector of $U_\Omega^* U_{\Omega T}$ with row index $t_0$, and $w^0 = (U_{\Omega T}^* U_{\Omega T})^{-1} v^0$. The following
three lemmas give estimates for the sizes of these vectors. From now on and for simplicity, we drop
the dependence on $U$ in $\mu(U)$.

Lemma 3.1 The second moment of $Z_0 := \|v^0\|$ obeys

$$\mathbf{E} Z_0^2 \le \mu^2 m |T|. \tag{3.11}$$

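Proof Set $\lambda_k^0 = u_{k,t_0}$. The vector $v^0$ is given by

$$v^0 = \sum_{k=1}^n \delta_k \lambda_k^0 u^k = \sum_{k=1}^n (\delta_k - \mathbf{E}\delta_k) \lambda_k^0 u^k,$$

where the second equality holds due to the orthogonality of the columns of $U$: $\sum_{1 \le k \le n} \lambda_k^0 u_{k,t} =
\sum_{1 \le k \le n} u_{k,t_0} u_{k,t} = 0$ for $t \in T$. We thus can view $v^0$ as a sum of independent random variables:

$$v^0 = \sum_{k=1}^n Y_k, \qquad Y_k = (\delta_k - m/n) \lambda_k^0 u^k, \tag{3.12}$$

where we note that $\mathbf{E} Y_k = 0$. It follows that

$$\mathbf{E} Z_0^2 = \sum_k \mathbf{E}\langle Y_k, Y_k \rangle + \sum_{k \ne k'} \mathbf{E}\langle Y_k, Y_{k'} \rangle = \sum_k \mathbf{E}\langle Y_k, Y_k \rangle.$$

Now

$$\mathbf{E}\|Y_k\|^2 = \frac{m}{n}\left(1 - \frac{m}{n}\right) |\lambda_k^0|^2 \|u^k\|^2 \le \frac{m}{n}\left(1 - \frac{m}{n}\right) |\lambda_k^0|^2 \mu^2 |T|.$$

Since $\sum_k |\lambda_k^0|^2 = n$, we proved that

$$\mathbf{E} Z_0^2 \le \left(1 - \frac{m}{n}\right) \mu^2 m |T|.$$

This establishes the claim. ■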
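
As a sanity check on (3.11), the second moment of $Z_0$ can be estimated by simulation. The sketch below is an illustration under assumptions not made in the text: $U$ is a $\pm 1$ Sylvester Hadamard matrix (so $\mu = 1$), the rows are selected independently with probability $m/n$, and the sizes, the set $T$ and the index $t_0$ are arbitrary. For this particular $U$ we have $|\lambda_k^0| = \mu$ and $\|u^k\|^2 = \mu^2 |T|$, so the last display of the proof holds with equality.

```python
import random


def hadamard(n):
    """Sylvester Hadamard matrix with +/-1 entries (n a power of two)."""
    return [[-1 if bin(i & j).count("1") % 2 else 1 for j in range(n)] for i in range(n)]


def squared_norm_v0(U, T, t0, omega):
    """||v^0||^2, where v^0(t) = sum over k in Omega of U[k][t0] * U[k][t], t in T."""
    return sum(sum(U[k][t0] * U[k][t] for k in omega) ** 2 for t in T)


n, m = 32, 8
T = [0, 1, 2, 3]
t0 = 10                       # any index outside T
U = hadamard(n)
mu = 1.0                      # largest |entry| of a +/-1 Hadamard matrix

rng = random.Random(1)
trials = 4000
total = 0.0
for _ in range(trials):
    omega = [k for k in range(n) if rng.random() < m / n]   # independent row selection
    total += squared_norm_v0(U, T, t0, omega)

mc_mean = total / trials                       # Monte Carlo estimate of E Z_0^2
exact = (1 - m / n) * mu**2 * m * len(T)       # (1 - m/n) mu^2 m |T|
bound = mu**2 * m * len(T)                     # right-hand side of (3.11)
print("Monte Carlo:", mc_mean, " exact:", exact, " bound (3.11):", bound)
```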
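The next result shows that the tail of $Z_0$ exhibits a Gaussian behavior.

Lemma 3.2 Fix $t_0 \in T^c$ and let $Z_0 = \|v^0\|$. Define $\bar{\sigma}$ as

$$\bar{\sigma}^2 = \mu^2 m \cdot \max(1, \mu |T| / \sqrt{m}).$$

Fix $a > 0$ obeying $a \le (m/\mu^2)^{1/4}$ if $\mu |T| / \sqrt{m} > 1$ and $a \le (m/(\mu^2 |T|))^{1/2}$ otherwise. Then

$$\mathbf{P}\left(Z_0 \ge \mu \sqrt{m |T|} + a \bar{\sigma}\right) \le e^{-\gamma a^2}, \tag{3.13}$$

for some positive constant $\gamma > 0$.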



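The proof of this lemma uses the powerful concentration inequality (3.9).

Proof By definition, $Z_0$ is given by

$$Z_0 = \sup_{\|f\| = 1} \langle v^0, f \rangle = \sup_{\|f\| = 1} \sum_{k=1}^n \langle Y_k, f \rangle$$

(and observe $\bar{Z}_0 = Z_0$). For a fixed unit vector $f$, let $f(Y_k)$ denote the mapping $\langle Y_k, f \rangle$. Since
$\mathbf{E} f(Y_k) = 0$, we can apply Theorem 3.2 with $B$ obeying

$$|f(Y_k)| \le |\lambda_k^0|\, |\langle f, u^k \rangle| \le |\lambda_k^0|\, \|u^k\| \le \mu^2 |T|^{1/2} := B.$$

Before we do this, we also need bounds on $\sigma^2$ and $\mathbf{E}\bar{Z}_0$. For the latter, we simply use

$$\mathbf{E}\bar{Z}_0 \le \sqrt{\mathbf{E} Z_0^2} \le \mu \sqrt{m |T|}. \tag{3.14}$$

For the former,

$$\mathbf{E} f^2(Y_k) = \frac{m}{n}\left(1 - \frac{m}{n}\right) |\lambda_k^0|^2 |\langle u^k, f \rangle|^2 \le \frac{m}{n}\left(1 - \frac{m}{n}\right) \mu^2 |\langle u^k, f \rangle|^2.$$

Since $\sum_{1 \le k \le n} |\langle u^k, f \rangle|^2 = n$, we proved that

$$\sum_{1 \le k \le n} \mathbf{E} f^2(Y_k) \le m \mu^2 \left(1 - \frac{m}{n}\right).$$

In conclusion, Theorem 3.2 shows that

$$\mathbf{P}(|Z_0 - \mathbf{E} Z_0| > t) \le 3 \exp\left(-\frac{t}{KB} \log\left(1 + \frac{Bt}{\mu^2 m + B \mu \sqrt{m |T|}}\right)\right). \tag{3.15}$$

Suppose now $\bar{\sigma}^2 = B \mu \sqrt{m |T|} \ge \mu^2 m$, and fix $t = a \bar{\sigma}$. Then it follows from (3.15) that

$$\mathbf{P}(|Z_0 - \mathbf{E} Z_0| > t) \le 3 e^{-\gamma a^2},$$

provided that $Bt \le \bar{\sigma}^2$. The same is true if $\bar{\sigma}^2 = \mu^2 m \ge B \mu \sqrt{m |T|}$ and $Bt \le \mu^2 m$. We omit the
details. The lemma follows from (3.14). ■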
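
The two regimes in Lemma 3.2 are easy to mix up, so a small Python sketch of the quantities involved may help. It computes $\bar{\sigma}$, the largest admissible $a$, and the level $\mu\sqrt{m|T|} + a\bar{\sigma}$ from (3.13), and then checks that the condition $Bt \le \bar{\sigma}^2$ with $B = \mu^2 |T|^{1/2}$ and $t = a\bar{\sigma}$ is exactly the hypothesis on $a$. The function names and the sample parameter triples are illustrative assumptions.

```python
import math


def sigma_bar(mu, m, size_T):
    """sigma_bar of Lemma 3.2: sigma_bar^2 = mu^2 m max(1, mu |T| / sqrt(m))."""
    return math.sqrt(mu**2 * m * max(1.0, mu * size_T / math.sqrt(m)))


def max_admissible_a(mu, m, size_T):
    """Largest a allowed by the hypothesis of Lemma 3.2."""
    if mu * size_T / math.sqrt(m) > 1:
        return (m / mu**2) ** 0.25
    return math.sqrt(m / (mu**2 * size_T))


def tail_level(mu, m, size_T, a):
    """Level mu sqrt(m |T|) + a sigma_bar appearing in (3.13)."""
    if not 0 < a <= max_admissible_a(mu, m, size_T):
        raise ValueError("a violates the hypothesis of Lemma 3.2")
    return mu * math.sqrt(m * size_T) + a * sigma_bar(mu, m, size_T)


# (mu, m, |T|): the first triple has mu |T| / sqrt(m) > 1, the other two do not.
checks = []
for mu, m, size_T in [(1.0, 400, 50), (1.0, 400, 10), (2.0, 1000, 5)]:
    a = max_admissible_a(mu, m, size_T)
    sb = sigma_bar(mu, m, size_T)
    B = mu**2 * math.sqrt(size_T)
    # With t = a * sigma_bar, the proof needs B t <= sigma_bar^2 (up to rounding).
    checks.append(B * a * sb <= sb**2 * (1 + 1e-9))
    print(mu, m, size_T, "a_max =", a, "level =", tail_level(mu, m, size_T, a))
print(checks)
```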



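Lemma 3.3 Let $w^0 = (U_{\Omega T}^* U_{\Omega T})^{-1} v^0$. With the same notations and hypotheses as in Lemma 3.2,
we have

$$\mathbf{P}\left(\sup_{t_0 \in T^c} \|w^0\| \ge 2\mu \sqrt{|T|/m} + 2a\bar{\sigma}/m\right) \le n e^{-\gamma a^2} + \mathbf{P}\left(\|U_{\Omega T}^* U_{\Omega T}\| \le m/2\right). \tag{3.16}$$

Proof Let $A$ and $B$ be the events $\{\|U_{\Omega T}^* U_{\Omega T}\| \ge m/2\}$ and $\{\sup_{t_0 \in T^c} \|v^0\| \le \mu \sqrt{m |T|} + a\bar{\sigma}\}$
respectively, and observe that Lemma 3.2 gives $\mathbf{P}(B^c) \le n e^{-\gamma a^2}$. On the event $A \cap B$,

$$\sup_{t_0 \in T^c} \|w^0\| \le \frac{2}{m}\left(\mu \sqrt{m |T|} + a\bar{\sigma}\right).$$

The claim follows.

Lemma 3.4 Assume that $z(t)$, $t \in T$, is an i.i.d. sequence of symmetric Bernoulli random variables.
For each $\lambda > 0$, we have

$$\mathbf{P}\left(\sup_{t \in T^c} |\pi(t)| > 1\right) \le 2n e^{-1/(2\lambda^2)} + \mathbf{P}\left(\sup_{t_0 \in T^c} \|w^0\| > \lambda\right). \tag{3.17}$$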

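Proof The proof is essentially an application of Hoeffding's inequality [2]. Conditioned on the $w^0$,
this inequality states that

$$\mathbf{P}\left(|\langle w^0, z \rangle| > 1 \mid w^0\right) \le 2 e^{-\frac{1}{2\|w^0\|^2}}. \tag{3.18}$$

Recall that $\pi(t_0) = \langle w^0, z \rangle$. It then follows that

$$\mathbf{P}\left(\sup_{t_0 \in T^c} |\pi(t_0)| > 1 \,\Big|\, \sup_{t_0 \in T^c} \|w^0\| \le \lambda\right) \le 2n e^{-\frac{1}{2\lambda^2}},$$

which proves the result.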
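
Inequality (3.18) can be checked exactly for a short vector by enumerating all sign sequences. The following Python sketch does this; the vector `w` is an arbitrary illustrative choice and the function names are hypothetical.

```python
import itertools
import math


def exact_tail(w):
    """P(|<w, z>| > 1) for z uniform over {-1, +1}^len(w), by full enumeration."""
    hits = sum(
        1
        for z in itertools.product((-1, 1), repeat=len(w))
        if abs(sum(wi * zi for wi, zi in zip(w, z))) > 1
    )
    return hits / 2 ** len(w)


def hoeffding_bound(w):
    """Right-hand side of (3.18): 2 exp(-1 / (2 ||w||^2))."""
    return 2.0 * math.exp(-1.0 / (2.0 * sum(wi * wi for wi in w)))


w = [0.25, -0.1, 0.2, 0.15, -0.3, 0.05, 0.2, -0.1, 0.1, 0.25]  # arbitrary example
print("exact tail:", exact_tail(w), " Hoeffding bound:", hoeffding_bound(w))
```

The enumeration costs $2^{|T|}$ evaluations, so it is only practical for very small supports; it is meant as a check of the inequality, not as part of the argument.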



The pieces are in place to prove Theorem 1.1. Set $\lambda = 2\mu \sqrt{|T|/m} + 2a\bar{\sigma}/m$. Combining Lemmas 3.4
and 3.3, we have for each $a > 0$ obeying the hypothesis of Lemma 3.2

$$\mathbf{P}\left(\sup_{t \in T^c} |\pi(t)| > 1\right) \le 2n e^{-1/(2\lambda^2)} + n e^{-\gamma a^2} + \mathbf{P}\left(\|U_{\Omega T}^* U_{\Omega T}\| \le m/2\right).$$

For the second term to be less than $\delta$, we choose $a$ such that

$$a^2 = \gamma^{-1} \log(n/\delta),$$

and assume this value from now on. The first term is less than $\delta$ if

$$\frac{1}{\lambda^2} \ge 2 \log(2n/\delta). \tag{3.19}$$

Suppose $\mu |T| \ge \sqrt{m}$. The condition in Lemma 3.2 is $a \le (m/\mu^2)^{1/4}$ or equivalently

$$m \ge \mu^2 \gamma^{-2} [\log(n/\delta)]^2,$$

where $\gamma$ is a numerical constant. In this case, $a\bar{\sigma} \le \mu \sqrt{m |T|}$ which gives

$$\frac{1}{\lambda^2} \ge \frac{1}{16} \frac{m}{\mu^2 |T|}. \tag{3.20}$$

Suppose now that $\mu |T| \le \sqrt{m}$. Then if $|T| \ge a^2$, $a\bar{\sigma} \le \mu \sqrt{m |T|}$ which gives again (3.20). On the
other hand if $|T| \le a^2$, $\lambda \le 4a\bar{\sigma}/m$ and

$$\frac{1}{\lambda^2} \ge \frac{1}{16} \frac{m}{a^2 \mu^2}.$$

To verify (3.19), it suffices to take $m$ obeying

$$\frac{m}{16 \mu^2} \min\left(\frac{1}{|T|}, \frac{1}{a^2}\right) \ge 2 \log(2n/\delta).$$

This analysis shows that the first term is less than $\delta$ if

$$m \ge K_1 \mu^2 \max(|T|, \log(n/\delta)) \log(n/\delta)$$

for some constant $K_1$. Finally, by Theorem 1.2 the last term will be bounded by $\delta$ if

$$m \ge K_2 \mu^2 |T| \log(n/\delta)$$

for some constant $K_2$. In conclusion, we proved that there exists a constant $K_3$ such that the
reconstruction is exact with probability at least $1 - \delta$ provided that the number of measurements
$m$ obeys

$$m \ge K_3 \mu^2 \max(|T|, \log(n/\delta)) \log(n/\delta).$$

The theorem is proved. 
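
The case analysis behind (3.19) can be traced numerically. The sketch below computes $\lambda$ for given $\mu$, $m$, $|T|$ and $a$, compares $1/\lambda^2$ with the lower bound $\frac{m}{16\mu^2}\min(1/|T|, 1/a^2)$, and scans for the smallest $m$ at which (3.19) and the hypothesis of Lemma 3.2 both hold. It is a rough illustration only: the value `gamma = 1.0` stands in for the unspecified constant $\gamma$, and the function names and parameter values are assumptions.

```python
import math


def sigma_bar(mu, m, size_T):
    """sigma_bar of Lemma 3.2: sigma_bar^2 = mu^2 m max(1, mu |T| / sqrt(m))."""
    return math.sqrt(mu**2 * m * max(1.0, mu * size_T / math.sqrt(m)))


def lam(mu, m, size_T, a):
    """lambda = 2 mu sqrt(|T| / m) + 2 a sigma_bar / m."""
    return 2 * mu * math.sqrt(size_T / m) + 2 * a * sigma_bar(mu, m, size_T) / m


def admissible(mu, m, size_T, a):
    """Hypothesis of Lemma 3.2 on a."""
    if mu * size_T / math.sqrt(m) > 1:
        return a <= (m / mu**2) ** 0.25
    return a <= math.sqrt(m / (mu**2 * size_T))


def case_lower_bound(mu, m, size_T, a):
    """(m / (16 mu^2)) min(1/|T|, 1/a^2), the lower bound on 1/lambda^2."""
    return m / (16 * mu**2) * min(1 / size_T, 1 / a**2)


def first_term_ok(mu, m, size_T, n, delta, gamma=1.0):
    """True if a is admissible and (3.19) holds, with a^2 = log(n/delta) / gamma.

    gamma = 1.0 is a placeholder for the unspecified constant of Lemma 3.2.
    """
    a = math.sqrt(math.log(n / delta) / gamma)
    inv_lam_sq = 1 / lam(mu, m, size_T, a) ** 2
    return admissible(mu, m, size_T, a) and inv_lam_sq >= 2 * math.log(2 * n / delta)


mu, size_T, n, delta = 1.0, 10, 2**16, 0.01
m_min = next((m for m in range(1, n + 1) if first_term_ok(mu, m, size_T, n, delta)), None)
print("smallest m passing the check (gamma = 1 assumed):", m_min)
```

Because $\gamma$ and the constants $K_1$, $K_2$, $K_3$ are not given explicitly, the number printed by this scan should be read as an order-of-magnitude illustration of the scaling in $\mu^2$, $|T|$ and $\log(n/\delta)$, not as a usable sampling rate.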

4 Discussion 

It is possible that a version of Theorem 1.1 exists that holds for all sign sequences on a set $T$
simultaneously, i.e. we can remove the condition that the signs are chosen uniformly at random.
Proving such a theorem with the methods above would require showing that the random vector
$w^0 = (U_{\Omega T}^* U_{\Omega T})^{-1} v^0$, where $v^0$ is as in (3.12), will not be aligned with the fixed sign sequence
$z$. We conjecture that this is indeed true, but proving such a statement seems considerably more
involved.

The new large-deviation inequality of Theorem 1.2 can also be used to sharpen results presented
in [3] about using $\ell_1$ minimization to find the sparsest decomposition of a signal in a union of
bases. Consider a signal $f \in \mathbb{R}^n$ that can be written as a sparse superposition of the columns of a
dictionary $D = (\Psi_1 \ \Psi_2)$ where each $\Psi_i$ is an orthonormal basis. In other words $f = Dx^0$, where
$x^0 \in \mathbb{R}^{2n}$ has small support. Given such an $f$, we attempt to recover $x^0$ by solving

$$\min_x \|x\|_{\ell_1} \quad \text{subject to} \quad Dx = f. \tag{4.1}$$

Combining Theorem 1.2 with the methods used in [3], we can establish that if

$$|\operatorname{supp} x^0| \le \mathrm{Const} \cdot \frac{n}{\mu^2(\Psi_1^* \Psi_2) \cdot \log n},$$

then the following will be true with high probability (where the support and signs of $x^0$ are drawn
at random):

1. There is no $x \ne x^0$ with $|\operatorname{supp} x| \le |\operatorname{supp} x^0|$ with $f = Dx$. That is, $x^0$ is the sparsest
possible decomposition of $f$.

2. We can recover $x^0$ from $f$ by solving (4.1).

This is a significant improvement over the bounds presented in [3], which have logarithmic factors
of $(\log n)^6$.